High Availability Alerts

Use the following tables to learn about each High Availability (HA) alert that Fault Management raises.

HA Service (Non-Redundant)


31050	HA Service (Non-Redundant)
Description	Send an alert when the standby node is not up, which indicates that the system has no redundancy.
Preconditions	Starting with EFA 3.1.0, a timer task periodically monitors the status of the standby node, and raises an event to the fault management system. The fault management system raises an alert to the user to indicate that the system is not fully redundant. For HA events, the polling frequency is every minute.
Requirements	Alert shows the following data: Node IP. The following example shows an alert when the standby node is down: <114>1 2003-10-11T22:14:15.003Z xco.machine.com FaultManager - - [meta sequenceId="47"] [origin ip="10.20.30.40" enterpriseId="1916" software="XCO" swVersion="3.4.0"] [alert@1916 resource="/App/System/HA/Nodes/Node" alertId="31050" cause="lossOfRedundancy" type="operationalViolation" severity="minor"] [alertData@1916 node_ip="10.1.2.4"] BOMHA degraded, node 10.1.2.4 is down.
Health Response	Response { Resource: /App/System/HA/Nodes/Node HQI { Color: Orange Value: 3 } StatusText: HA degraded, node 10.2.3.5 is down. }
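
The example alert above appears to follow the RFC 5424 syslog layout, with the alert fields carried in bracketed structured-data elements. The following is a minimal sketch, not product code, of how a receiver might extract those fields. The function name `parse_structured_data` and the regular expressions are illustrative assumptions, and the sketch does not handle escaped quotes inside values.

```python
import re

# Sample line adapted from the alert example above (straight quotes assumed).
SAMPLE = (
    '<114>1 2003-10-11T22:14:15.003Z xco.machine.com FaultManager - - '
    '[meta sequenceId="47"] '
    '[origin ip="10.20.30.40" enterpriseId="1916" software="XCO" swVersion="3.4.0"] '
    '[alert@1916 resource="/App/System/HA/Nodes/Node" alertId="31050" '
    'cause="lossOfRedundancy" type="operationalViolation" severity="minor"] '
    '[alertData@1916 node_ip="10.1.2.4"] BOMHA degraded, node 10.1.2.4 is down.'
)

# One bracketed element: an identifier followed by zero or more name="value" pairs.
SD_ELEMENT = re.compile(r'\[(?P<sd_id>[^\s\]]+)(?P<params>(?: [^\s=\]]+="[^"]*")*)\]')
SD_PARAM = re.compile(r'([^\s=\]]+)="([^"]*)"')


def parse_structured_data(line):
    """Return {element_id: {param: value}} for every bracketed element in the line."""
    return {
        m.group("sd_id"): dict(SD_PARAM.findall(m.group("params")))
        for m in SD_ELEMENT.finditer(line)
    }


fields = parse_structured_data(SAMPLE)
print(fields["alert@1916"]["alertId"], fields["alertData@1916"]["node_ip"])
```

A receiver could then key its handling on `alertId` and read alert-specific data, such as `node_ip`, from the `alertData@1916` element.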

HA Service (Fully Redundant)


31051	HA Service (Fully Redundant)
Description	Send an alert when the standby node is up and ready, which indicates that the system is fully redundant.
Preconditions	A timer task periodically monitors the status of the nodes and raises an event to the fault management system. The fault management system raises an alert to the user to indicate that the system is fully redundant. For HA events, the polling frequency is every minute.
Requirements	Alert shows the following data: None. The following example shows an alert when the standby node is up and running: <118>1 2003-10-11T22:14:15.003Z xco.machine.com FaultManager - - [meta sequenceId="47"] [origin ip="10.20.30.40" enterpriseId="1916" software="XCO" swVersion="3.4.0"] [alert@1916 resource="/App/System/HA/Nodes/Node" alertId="31051" cause="redundancyRestored" type="operationalViolation" severity="info"] BOMHA fully redundant
Health Response	Response { Resource: /App/System/HA/Nodes/Node HQI { Color: Green Value: 0 } StatusText: HA fully redundant. }
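
Assuming the leading `<118>` in the example above is a standard RFC 5424 PRI value, it encodes a syslog facility and severity as `facility * 8 + severity`. The sketch below decodes it. The function name `decode_pri` is an illustrative assumption, and the severity names are the RFC 5424 names rather than the `severity` attribute values used in the alert element.

```python
# RFC 5424 severity names, indexed by numeric severity (0 to 7).
SEVERITY_NAMES = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
]


def decode_pri(pri):
    """Split a syslog PRI value into (facility, severity, severity_name)."""
    facility, severity = divmod(pri, 8)
    return facility, severity, SEVERITY_NAMES[severity]


print(decode_pri(118))  # (14, 6, 'informational')
```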

HA Service (Failover Occurred)


31052	HA Service (Failover Occurred)
Description	Send an alert when an HA failover has occurred.
Preconditions	A timer task periodically monitors the status of the nodes and raises an event to the fault management system. The fault management system raises an alert to the user to indicate that an HA failover has occurred. For HA events, the polling frequency is every minute.
Requirements	Alert shows the following data: Active IP. The following example shows an alert when an HA failover occurs: <114>1 2003-10-11T22:14:15.003Z xco.machine.com FaultManager - - [meta sequenceId="47"] [origin ip="10.20.30.40" enterpriseId="1916" software="XCO" swVersion="3.4.0"] [alert@1916 resource="/App/System/HA/Nodes/Node" alertId="31052" cause="localNodeTransmissionError" type="operationalViolation" severity="major"] [alertData@1916 active_iP="10.1.2.3"] BOM10.1.2.3 is now the HA active node
Health Response	Response { Resource: /App/System/HA/Nodes/Node HQI { Color: Red Value: 4 } StatusText: 10.1.2.3 is now the HA active node. }
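
The health response above is shown in a compact textual form. If a script receives it as text in exactly that layout (an assumption, because the actual API encoding is not described here), a sketch like the following could pull out the resource, HQI color and value, and status text. The names `HEALTH_PATTERN` and `parse_health_response` are hypothetical.

```python
import re

# Sample text adapted from the health response shown above.
SAMPLE = (
    "Response { Resource: /App/System/HA/Nodes/Node "
    "HQI { Color: Red Value: 4 } "
    "StatusText: 10.1.2.3 is now the HA active node. }"
)

HEALTH_PATTERN = re.compile(
    r"Resource:\s*(?P<resource>\S+)\s+"
    r"HQI\s*\{\s*Color:\s*(?P<color>\w+)\s+Value:\s*(?P<value>\d+)\s*\}\s*"
    r"StatusText:\s*(?P<status>.*?)\s*\}\s*$"
)


def parse_health_response(text):
    """Parse the textual health response into a dictionary."""
    m = HEALTH_PATTERN.search(text)
    if m is None:
        raise ValueError("unrecognized health response layout")
    return {
        "resource": m["resource"],
        "color": m["color"],
        "value": int(m["value"]),
        "status_text": m["status"],
    }


health = parse_health_response(SAMPLE)
print(health["color"], health["value"])
```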

Service Degraded


31053	Service Degraded
Description	Send an alert when some of the node services are not operational.
Preconditions	A timer task periodically monitors the node status and raises an event to the fault management system. The fault management system raises an alert to the user to indicate that some of the node services are not running. For service events, the polling frequency is every minute.
Requirements	Alert shows the following data: None. The following example shows an alert when some of the node services are not running: <116>1 2003-10-11T22:14:15.003Z xco.machine.com FaultManager - - [meta sequenceId="47"] [origin ip="10.20.30.40" enterpriseId="1916" software="XCO" swVersion="3.4.0"] [alert@1916 resource="/App/System/HA/Nodes/Services" alertId="31053" cause="serviceDegraded" type="operationalViolation" severity="warning"] BOMSome of the services are not operational.
Health Response	Response { Resource: /App/System/HA/Nodes/Services HQI { Color: Yellow Value: 2 } StatusText: Some of the services are not operational. }
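
The preconditions above describe a timer task that polls service status every minute. The source does not say whether an alert is raised on every poll or only when the status changes. The sketch below assumes the latter and shows one way such a check could choose between alert 31053 and alert 31054. The function `evaluate_poll` and the service names `svc-a` and `svc-b` are hypothetical and do not reflect the product's implementation.

```python
SERVICE_DEGRADED = 31053
SERVICE_RESTORED = 31054


def evaluate_poll(was_operational, service_states):
    """Compare this poll with the previous one.

    service_states maps a service name to True when that service is running.
    Returns (alert_id or None, is_operational). An alert is produced only
    when the overall status changes (an assumption, see the text above).
    """
    is_operational = all(service_states.values())
    if is_operational == was_operational:
        return None, is_operational
    alert_id = SERVICE_RESTORED if is_operational else SERVICE_DEGRADED
    return alert_id, is_operational


# Three consecutive polls: healthy, one service down, healthy again.
polls = [
    {"svc-a": True, "svc-b": True},
    {"svc-a": True, "svc-b": False},
    {"svc-a": True, "svc-b": True},
]
state = True  # assume all services were running before the first poll
alerts = []
for poll in polls:
    alert, state = evaluate_poll(state, poll)
    alerts.append(alert)
print(alerts)  # [None, 31053, 31054]
```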

Service Restored


31054	Service Restored
Description	Send an alert when all the node services are operational.
Preconditions	A timer task periodically monitors the node status and raises an event to the fault management system. The fault management system raises an alert to the user to indicate that all the node services are running. For service events, the polling frequency is every minute.
Requirements	Alert shows the following data: None. The following example shows an alert when all the node services are running: <118>1 2003-10-11T22:14:15.003Z xco.machine.com FaultManager - - [meta sequenceId="47"] [origin ip="10.20.30.40" enterpriseId="1916" software="XCO" swVersion="3.4.0"] [alert@1916 resource="/App/System/HA/Nodes/Services" alertId="31054" cause="serviceRestored" type="operationalViolation" severity="info"] BOMServices are in running state.
Health Response	Response { Resource: /App/System/HA/Nodes/Services HQI { Color: Green Value: 0 } StatusText: Services are in running state. }
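
The five alerts and their health responses in the tables above can be collected into a single lookup, which may be convenient for a monitoring script. This is a sketch only. The name `HA_ALERTS` and the helper `worst_alert` are illustrative, and treating a higher HQI value as worse health is an assumption inferred from the Green 0 through Red 4 values shown in the tables.

```python
# alert ID -> (alert name, HQI color, HQI value), copied from the tables above.
HA_ALERTS = {
    31050: ("HA Service (Non-Redundant)", "Orange", 3),
    31051: ("HA Service (Fully Redundant)", "Green", 0),
    31052: ("HA Service (Failover Occurred)", "Red", 4),
    31053: ("Service Degraded", "Yellow", 2),
    31054: ("Service Restored", "Green", 0),
}


def worst_alert(alert_ids):
    """Return the alert ID with the highest HQI value (assumed to be the worst)."""
    return max(alert_ids, key=lambda alert_id: HA_ALERTS[alert_id][2])


name, color, value = HA_ALERTS[worst_alert([31051, 31053, 31050])]
print(name, color, value)  # HA Service (Non-Redundant) Orange 3
```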

9038968-00 Rev AB